perf(crypto): specialize keccak256 for the fixed-shape Merkle parent hash by Oppen · Pull Request #774 · yetanotherco/lambda_vm

Oppen · 2026-07-03T18:49:50Z

Summary

Specializes hash_new_parent (the Merkle parent hash, whose input is always exactly 64 bytes: two 32-byte nodes) for Keccak256 with 32-byte nodes, replacing sha3's generic incremental Digest/block_buffer machinery with a single hand-rolled keccak-f[1600] permutation. Dispatch is a TypeId-based compile-time-constant check (D == Keccak256 && NUM_BYTES == 32); every other digest/size combination falls through unchanged to the original generic path.
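The dispatch condition can be sketched as follows. This is a minimal illustration, assuming hypothetical marker types (`Keccak256Marker`, `OtherDigest`) in place of the real `sha3::Keccak256` and other digests; `select_path` and `Path` are illustrative names, not the PR's code.

```rust
use std::any::TypeId;

// Hypothetical stand-ins for real digest types. In the PR, `D` is a digest
// type and the comparison is against sha3's Keccak256.
struct Keccak256Marker;
struct OtherDigest;

/// Which hashing path a given (D, NUM_BYTES) instantiation takes.
/// Hypothetical enum, for illustration only.
#[derive(Debug, PartialEq)]
enum Path {
    FastKeccakBlock,
    GenericDigest,
}

/// Both operands are fixed for each monomorphized instantiation, so after
/// inlining the optimizer can typically fold this branch away entirely.
fn select_path<D: 'static, const NUM_BYTES: usize>() -> Path {
    if TypeId::of::<D>() == TypeId::of::<Keccak256Marker>() && NUM_BYTES == 32 {
        Path::FastKeccakBlock
    } else {
        // Any mismatch (other digest, other node size) takes the generic path.
        Path::GenericDigest
    }
}
```

For example, `select_path::<Keccak256Marker, 32>()` selects the fast path, while `select_path::<Keccak256Marker, 64>()` and `select_path::<OtherDigest, 32>()` both fall back to the generic digest. A mismatch can only ever route to the slower but correct path, which is what makes the TypeId check fail-safe.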
Also fast-paths FieldElementPairBackend::hash_data (always exactly 2 small field elements, always sub-rate) the same way. Note: this backend is used prover-side only (FriLayerMerkleTreeBackend) — the guest-side cycle win below comes entirely from the hash_new_parent change.
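For any sub-rate input such as a serialized field-element pair, a single-block absorb comes down to XORing the message into the state as little-endian lanes and applying the original Keccak pad10*1 padding (0x01 after the message, 0x80 on the last rate byte; SHA3-256 would use 0x06 instead). The sketch below assumes only that the pair serializes to fewer than 136 bytes; the actual encoding is not given here. `absorb_single_block` is a hypothetical name, and unlike the PR's parent-hash path it uses a scratch block buffer for generality.

```rust
/// Keccak256 rate in bytes: (1600 - 2 * 256) bits = 1088 bits = 136 bytes.
const RATE: usize = 136;

/// Builds the pre-permutation keccak state for a message that fits in one
/// block (len <= RATE - 1). Returns None when more than one permutation
/// would be needed, so the caller can fall back to the generic path.
fn absorb_single_block(msg: &[u8]) -> Option<[u64; 25]> {
    if msg.len() >= RATE {
        return None;
    }
    // Message bytes, then original-Keccak padding (0x01 ... 0x80).
    // When len == 135 both pad bytes land on the same byte, giving 0x81.
    let mut block = [0u8; RATE];
    block[..msg.len()].copy_from_slice(msg);
    block[msg.len()] ^= 0x01;
    block[RATE - 1] ^= 0x80;

    // The 136-byte block fills lanes 0..17 as little-endian u64s; the
    // remaining 8 lanes (the capacity) stay zero.
    let mut state = [0u64; 25];
    for (i, chunk) in block.chunks_exact(8).enumerate() {
        let mut lane = [0u8; 8];
        lane.copy_from_slice(chunk);
        state[i] = u64::from_le_bytes(lane);
    }
    Some(state)
}
```

Running keccak-f[1600] once on the returned state and reading out the first 32 bytes of lanes 0..4 (little-endian) yields the Keccak256 digest.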
Deliberately does not fast-path FieldElementVectorBackend::hash_data (wide trace-row leaves). Measurement showed that a "try one block, else fall back" attempt is a net regression: real rows always exceed the 136-byte keccak rate, so the attempt is pure wasted overhead. That path stays generic, with a comment explaining why.
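The rate arithmetic behind this decision can be made concrete. Because pad10*1 always adds at least one byte, a message of `len` bytes needs `len / 136 + 1` permutations, so only inputs of 0..=135 bytes fit in one block. `permutations_for` and `fits_one_block` are illustrative helpers, not functions from the PR.

```rust
/// Keccak256 rate in bytes.
const RATE: usize = 136;

/// Number of keccak-f[1600] calls needed to absorb `len` bytes.
/// Padding always adds at least one byte, so a message that is an exact
/// multiple of the rate still needs one extra block.
fn permutations_for(len: usize) -> usize {
    len / RATE + 1
}

/// True when a single-permutation fast path could apply.
fn fits_one_block(len: usize) -> bool {
    permutations_for(len) == 1
}
```

A 64-byte parent hash needs exactly one permutation, whereas a hypothetical 320-byte trace row needs three; for such rows a "fits in one block?" probe never succeeds, so it only adds cost before the generic path runs anyway.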
Tier 1 of the keccak-cost-reduction work scoped in .refs/merkle-keccak_handoff.md; tier 2 (routing the permutation through the VM's keccak precompile) is explicitly out of scope here — it needs its own outer-proving-cost measurement gate.

Measurements (recursion verifier guest, cycle-accurate)

| cycles | single-query | multi-query |
|---|---:|---:|
| baseline (`perf/rkyv-serialization`) | 89,721,844 | 2,210,366,539 |
| this PR | 89,081,698 | 2,152,039,894 |
| delta | −640,146 (−0.7%) | −58,326,645 (−2.6%) |

In-VM correctness verified at every step: the guest's committed 32-byte output digest is byte-identical to baseline across all intermediate versions of this change.

Review

Went through two rounds of adversarial review (performance, correctness, cryptographic soundness, implementation simplicity) against this exact diff. First round surfaced real issues (parent-hash fast path was doing a redundant buffer copy and wasn't being inlined across the crate boundary; test coverage only pinned the inner permutation helper, not the dispatch condition itself) — all fixed and re-measured. Second round came back clean: no correctness or soundness defects, byte-for-byte lane/padding arithmetic independently verified against the Keccak spec, prover/verifier path-selection symmetry confirmed, fail-safe TypeId-mismatch fallback re-verified.

Test plan

- `cargo test --workspace --exclude math-cuda` (math-cuda needs a GPU, not available here)
- `make test-ethrex`
- `cargo test -p crypto -p stark`
- New unit tests: byte-identity of the hand-rolled permutation against sha3::Keccak256 (1000 random 64-byte pairs, all sub-rate lengths 0..135), and dispatch-level pinning tests comparing the trait-method output (not just the inner helper) against an independent reference
- `cargo test -p lambda-vm-prover --lib test_recursion_execute_1query -- --ignored --nocapture`: in-VM verify accepted, output digest unchanged from baseline
- `make test-profile-recursion-single` / `-multi`: cycle counts above

hash_new_parent always hashes exactly 64 bytes (two 32-byte nodes), which fits the keccak rate in one block. Skip sha3's block_buffer/Digest machinery and run keccak-f[1600] directly for the Keccak256/32-byte case; fall back to the generic Digest path for every other backend instantiation. Cuts ~23M cycles (~1%) off the recursion verifier guest's multi-query profile.

Address adversarial-review findings on the previous commit:

- hash_new_parent's fast path now builds the keccak state directly from left/right (no intermediate 136-byte buffer, no owned-array copies) and is #[inline], so it collapses into callers instead of costing a real cross-crate call plus redundant copies.
- FieldElementPairBackend::hash_data (always 2 small elements, always single-block) gets the same single-permutation treatment.
- FieldElementVectorBackend::hash_data deliberately keeps the plain multi-block Digest path: its inputs are whole trace rows, which always exceed the keccak rate, so a "try one block, else fall back" attempt measured as a net regression (all cost, no payoff) rather than a win.
- Added tests pinning the TypeId dispatch itself (not just the inner permutation helpers) against an independent sha3 reference.

Cuts guest cycles further: single-query 89,632,723 -> 89,081,698, multi-query 2,187,109,919 -> 2,152,039,894 (vs. original baseline 89,721,844 / 2,210,366,539; roughly -0.7% / -2.6% total).
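A self-contained sketch of the single-permutation parent hash, building the state directly from the two nodes. It assumes the original Keccak padding used by Keccak256 (0x01 domain byte, not SHA3's 0x06) and little-endian lane loading; the table-driven `keccak_f1600` follows the well-known compact reference formulation, and `keccak256_parent` is an illustrative name rather than the PR's code.

```rust
const RC: [u64; 24] = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a, 0x8000000080008000,
    0x000000000000808b, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008a, 0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800a, 0x800000008000000a,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
];
// Rho rotation offsets and pi lane order, in the compact reference ordering.
const RHO: [u32; 24] = [
    1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44,
];
const PI: [usize; 24] = [
    10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1,
];

/// keccak-f[1600] over 25 lanes, indexed as a[x + 5 * y].
fn keccak_f1600(a: &mut [u64; 25]) {
    for &rc in RC.iter() {
        // Theta: XOR each lane with the parities of two neighbouring columns.
        let mut c = [0u64; 5];
        for x in 0..5 {
            c[x] = a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20];
        }
        for x in 0..5 {
            let d = c[(x + 4) % 5] ^ c[(x + 1) % 5].rotate_left(1);
            for y in 0..5 {
                a[x + 5 * y] ^= d;
            }
        }
        // Rho and pi: rotate lanes and move them along the pi cycle.
        let mut last = a[1];
        for i in 0..24 {
            let j = PI[i];
            let tmp = a[j];
            a[j] = last.rotate_left(RHO[i]);
            last = tmp;
        }
        // Chi: the only non-linear step, applied row by row.
        for y in 0..5 {
            let row = [a[5 * y], a[5 * y + 1], a[5 * y + 2], a[5 * y + 3], a[5 * y + 4]];
            for x in 0..5 {
                a[5 * y + x] = row[x] ^ (!row[(x + 1) % 5] & row[(x + 2) % 5]);
            }
        }
        // Iota: break symmetry with the round constant.
        a[0] ^= rc;
    }
}

/// Keccak256(left || right) for two 32-byte nodes with exactly one permutation.
fn keccak256_parent(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut state = [0u64; 25];
    // Message bytes 0..63 fill lanes 0..7: left in lanes 0..3, right in 4..7.
    for i in 0..4 {
        let mut lane = [0u8; 8];
        lane.copy_from_slice(&left[8 * i..8 * i + 8]);
        state[i] = u64::from_le_bytes(lane);
        lane.copy_from_slice(&right[8 * i..8 * i + 8]);
        state[4 + i] = u64::from_le_bytes(lane);
    }
    state[8] = 0x01; // pad byte at offset 64 (lane 8, lowest byte)
    state[16] = 0x80u64 << 56; // final pad bit at offset 135 (lane 16, top byte)
    keccak_f1600(&mut state);
    // Squeeze: the digest is the first 32 bytes of the state, little-endian.
    let mut out = [0u8; 32];
    for i in 0..4 {
        out[8 * i..8 * i + 8].copy_from_slice(&state[i].to_le_bytes());
    }
    out
}
```

The two padding writes are all the lane/padding arithmetic this case needs: with a fixed 64-byte input, the message, the 0x01 byte, and the 0x80 byte always land in the same lanes, so no block buffer or length bookkeeping is required.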

Oppen added 2 commits July 3, 2026 15:17

Oppen wants to merge 2 commits into perf/rkyv-serialization from perf/merkle-block-keccak

Oppen commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Oppen commented Jul 3, 2026

Summary

Measurements (recursion verifier guest, cycle-accurate)

Review

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant